Chapter 1: Sample Space and Probability


1.1 Sets


1.2 Probabilistic Models

1.2.1 Sample Space and Event

1.2.2 Choose Appropriate Sample Space

1.2.3 Sequential Model

1.2.4 Axioms of Probability (Nonnegativity, Additivity, Normalization)

1.2.5 Discrete Model

1.2.6 Continuous Model

1.2.7 Properties of Probability

1.2.8 Model and Reality


1.3 Conditional Probability

"How should you update probability/beliefs/uncertainty based on new evidence?"
"Conditioning is the soul of statistics" - Blitzstein

1.3.1 Conditional Probability Law

1.3.2 Conditional to Unconditional in Sequential Model


1.4 Total Probability Theorem & Bayes' Rule


1.5 Independence

1.5.1 Conditional Independence

1.5.2 Independence of a Series of Events

1.5.4 Independent Experiment & Binomial Probability


1.6 Counting

1.6.1 Multiplication Rule

1.6.2 Permutation

1.6.3 Combination

|               | order matters          | order doesn't matter |
| ------------- | ---------------------- | -------------------- |
| replace       | $n^k$                  | ${n+k-1 \choose k}$  |
| don't replace | $\dfrac{n!}{(n-k)!}$   | ${n \choose k}$      |
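The four entries above can be brute-force checked with `itertools` (the variable names below are ours, not from the text):

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, factorial

n, k = 5, 3
items = range(n)

ordered_repl = len(list(product(items, repeat=k)))                   # n^k
ordered_norepl = len(list(permutations(items, k)))                   # n!/(n-k)!
unordered_norepl = len(list(combinations(items, k)))                 # C(n, k)
unordered_repl = len(list(combinations_with_replacement(items, k)))  # C(n+k-1, k)

assert ordered_repl == n**k
assert ordered_norepl == factorial(n) // factorial(n - k)
assert unordered_norepl == comb(n, k)
assert unordered_repl == comb(n + k - 1, k)
```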

1.6.4 Segmentation

Chapter 2 Discrete Random Variables


2.1 Basic Concepts


2.2 Distributions

| Random Variable $X$ | Definition | Probability Mass Function $p_X(k)=P(X=k)$ | Expectation $E[X]=\sum_x x\,p_X(x)$ | Variance $var(X)=E[X^2]-(E[X])^2$ | Moment Generating Function $M_X(s)=E[e^{sX}]$ |
| --- | --- | --- | --- | --- | --- |
| Discrete Uniform $X \sim Unif\{a,b\}$ | a finite number of values are equally likely to be observed (e.g. a die) | $p_X(k)= \begin{cases} \dfrac{1}{b-a+1}, & k=a,a+1,\dots,b \\ 0, & \text{otherwise} \end{cases}$ | $\dfrac{a+b}{2}$ | $\dfrac{(b-a)(b-a+2)}{12}$ | $\dfrac{e^{as}}{b-a+1} \cdot \dfrac{e^{(b-a+1)s}-1}{e^s-1}$ |
| Bernoulli $X\sim Bern(p)$ | $X$ takes only two values, $0$ and $1$ | $p_X(k)= \begin{cases} p, & k=1 \\ 1-p, & k=0 \end{cases}$ | $p$ | $p(1-p)$ | $1-p+pe^{s}$ |
| Binomial $X\sim Bin(n,p)$ | $X$ is the number of successes in $n$ independent $Bern(p)$ trials | ${n \choose k}p^k(1-p)^{n-k},\ k=0,1,\dots,n$ | $np$ | $np(1-p)$ | $(1-p+pe^{s})^n$ |
| Multinomial $\vec{X} \sim Mult(n,\vec{p})$, $\vec{X}=(X_1,\dots,X_k)$, $\vec{p}=(p_1,\dots,p_k)$ | $n$ objects independently placed into $k$ categories; $p_j = P(\text{category } j)$, $X_j$ is the number of objects in category $j$ | joint PMF: $P(X_1=n_1,\dots,X_k=n_k)=\dfrac{n!}{n_1! \cdots n_k!}p_1^{n_1} \cdots p_k^{n_k}$ for $n_1+\cdots+n_k=n$ | $E[X_i]=np_i$ | $var(X_i)=np_i(1-p_i)$ | $M_{\vec{X}}(\vec{s}) = \big( \sum_{i=1}^k p_i e^{s_i} \big)^n$ |
| Hypergeometric $X \sim Hypergeo(w,b,n)$ | $X$ is the number of white balls in $n$ draws without replacement from a population of size $N=w+b$ containing $w$ white and $b$ black balls | $\dfrac{{w \choose k}{b \choose n-k}}{{w+b \choose n}},\ 0 \leq k \leq w$ | $n\dfrac{w}{w+b}$ | $\dfrac{N-n}{N-1} \, n \dfrac{w}{N}\dfrac{b}{N}$ | — |
| Geometric $X\sim Geo(p)$ | independent $Bern(p)$ trials; $X$ is the number of the trial at which the first success occurs | $(1-p)^{k-1}p,\ k=1,2,\dots$ | $\dfrac{1}{p}$ | $\dfrac{1-p}{p^2}$ | $\dfrac{pe^s}{1-(1-p)e^s}$ |
| Poisson $X\sim Pois(\lambda)$ | the limit of $Bin(n,p)$ as $n\rightarrow \infty$, $p \rightarrow 0$ with $np=\lambda$ | $e^{-\lambda}\dfrac{\lambda^k}{k!},\ k=0,1,2,\dots$ | $\lambda$ | $\lambda$ | $e^{\lambda(e^s-1)}$ |
| Negative Binomial $X\sim NB(r,p)$ | $X$ is the number of failures (each trial fails with probability $p$) before the $r$th success | ${k+r-1 \choose k}p^k(1-p)^r,\ k=0,1,\dots$ | $\dfrac{pr}{1-p}$ | $\dfrac{pr}{(1-p)^2}$ | $\bigg( \dfrac{1-p}{1-pe^s} \bigg)^r$ |
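Sanity check: the Expectation and Variance columns can be recomputed directly from the PMF definition $E[X]=\sum_x x\,p_X(x)$. A minimal Python sketch for the Binomial row (the helper name `binom_pmf` is ours):

```python
from math import comb

def binom_pmf(n, p, k):
    # p_X(k) = C(n,k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
mean = sum(k * binom_pmf(n, p, k) for k in range(n + 1))
ex2 = sum(k**2 * binom_pmf(n, p, k) for k in range(n + 1))
var = ex2 - mean**2

assert abs(mean - n * p) < 1e-12           # E[X] = np
assert abs(var - n * p * (1 - p)) < 1e-12  # var(X) = np(1-p)
```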
Properties of Multinomial Distribution

(a) Its marginal distributions are Binomial: if $\vec{X} \sim Mult(n, \vec{p})$, then $X_j \sim Bin(n, p_j)$.
(b) Lumping property: for example, if $\vec{X}=(X_1, X_2, \dots, X_{10}) \sim Mult(n, (p_1, \dots, p_{10}))$ and $\vec{Y}=(X_1, X_2, X_3+\cdots+X_{10})$, then $\vec{Y} \sim Mult(n, (p_1, p_2, p_3+\cdots+p_{10}))$.
(c) Conditional joint distribution: for example, if $\vec{X} \sim Mult(n, \vec{p})$ and $X_1=n_1$ is given, then $(X_2,\dots,X_k) \sim Mult(n-n_1, (p_2',\dots,p_k'))$, where $p_j' = \dfrac{p_j}{p_2+\cdots+p_k}$.
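Property (a) can be verified numerically in a small case by summing the joint PMF over the other coordinates; a sketch (the helper name `mult_pmf` is ours):

```python
from math import comb, factorial

def mult_pmf(ns, ps):
    # n!/(n1!...nk!) * p1^n1 ... pk^nk
    n = sum(ns)
    coef = factorial(n)
    for m in ns:
        coef //= factorial(m)
    prob = float(coef)
    for m, p in zip(ns, ps):
        prob *= p**m
    return prob

n, ps = 6, (0.2, 0.3, 0.5)
# property (a): the marginal of X1 is Bin(n, p1)
for n1 in range(n + 1):
    marg = sum(mult_pmf((n1, n2, n - n1 - n2), ps) for n2 in range(n - n1 + 1))
    binom = comb(n, n1) * ps[0]**n1 * (1 - ps[0]) ** (n - n1)
    assert abs(marg - binom) < 1e-12
```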

Variance of Hypergeometric Distribution

$X \sim Hypergeo(w,b,n)$, $p=\dfrac{w}{w+b}$, $N=w+b$. Writing $X=\sum_{j=1}^n X_j$, where $X_j$ indicates that the $j$th draw is white:

$$\begin{aligned} var(X) = var\bigg(\sum_{j=1}^n X_j\bigg) &= \sum_{j=1}^n var(X_j) + 2 \sum_{i < j} cov(X_i,X_j) \\ &= n\, var(X_1) + 2 {n \choose 2} cov(X_1,X_2) \\ &= n\, var(X_1) + 2 {n \choose 2} \big(E[X_1X_2]-E[X_1]E[X_2]\big) \\ &= np(1-p) + 2\, \dfrac{n(n-1)}{2!} \bigg( \dfrac{w}{w+b} \cdot \dfrac{w-1}{w+b-1} - p^2 \bigg) \\ &= \underbrace{\dfrac{N-n}{N-1}}_{\text{finite population correction}}\ \underbrace{n\, \dfrac{w}{N}\, \dfrac{b}{N}}_{np(1-p)} \end{aligned}$$
Extreme cases:
(1) $n=1$: $Hypergeo(w,b,1) \to Bern(p)$, so $var(X)=p(1-p)$.
(2) $N$ much larger than $n$: $\dfrac{N-n}{N-1} \to 1$, so $Hypergeo(w,b,n) \to Bin(n,p)$ and $var(X) \approx np(1-p)$.
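The closed form above can be checked against a direct computation from the hypergeometric PMF; a small Python sketch (the helper name `hyper_pmf` and the parameter values are ours):

```python
from math import comb

def hyper_pmf(w, b, n, k):
    return comb(w, k) * comb(b, n - k) / comb(w + b, n)

w, b, n = 7, 5, 4
N, p = w + b, w / (w + b)
ks = range(max(0, n - b), min(n, w) + 1)  # support of X
mean = sum(k * hyper_pmf(w, b, n, k) for k in ks)
var = sum(k**2 * hyper_pmf(w, b, n, k) for k in ks) - mean**2

assert abs(mean - n * p) < 1e-12                                # E[X] = np
assert abs(var - (N - n) / (N - 1) * n * p * (1 - p)) < 1e-12   # with correction factor
```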

Poisson Paradigm

Proof that $Bin(n,p)$ converges to $Pois(\lambda)$ as $n\rightarrow \infty$, $p \rightarrow 0$, with $\lambda = np$ held constant $\big(p=\dfrac{\lambda}{n}\big)$:

$$\begin{aligned} p_X(k) &= \dfrac{n!}{(n-k)!\,k!}\,p^k(1-p)^{n-k} \\ &= \dfrac{n(n-1) \cdots (n-k+1)}{n^k} \cdot \dfrac{\lambda^k}{k!} \cdot \Big(1-\dfrac{\lambda}{n}\Big)^{n-k} \end{aligned}$$

For fixed $k$ and $j=1,2,\dots,k$: as $n \rightarrow \infty$, $\dfrac{n-k+j}{n} \rightarrow 1$, $\Big(1-\dfrac{\lambda}{n}\Big)^{-k} \rightarrow 1$, and $\Big(1-\dfrac{\lambda}{n}\Big)^{n} \rightarrow e^{-\lambda}$, so

$$p_X(k) \rightarrow e^{-\lambda} \dfrac{\lambda^k}{k!}$$
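The convergence can also be observed numerically: for fixed $k$, the gap between the $Bin(n, \lambda/n)$ and $Pois(\lambda)$ PMFs shrinks as $n$ grows. A sketch (the values $\lambda=2$, $k=3$ are arbitrary):

```python
from math import comb, exp, factorial

lam, k = 2.0, 3
pois = exp(-lam) * lam**k / factorial(k)  # Poisson PMF at k

prev_err = float("inf")
for n in (10, 100, 1000, 10000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p) ** (n - k)  # Binomial PMF at k
    err = abs(binom - pois)
    assert err < prev_err  # the error shrinks as n grows
    prev_err = err
assert prev_err < 1e-3
```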


2.3 Functions of Random Variables


2.4 Expectation, Mean, and Variance

2.4.1 Variance, Moment and Law of the Unconscious Statistician(LOTUS)

2.4.2 Properties of Mean and Variance


2.5 Joint PMFs of Multiple Random Variables

2.5.1 Functions of Multiple Random Variables

2.5.2 More than Two Random Variables


2.6 Conditioning

2.6.1 Given Some Event Occurred

2.6.2 Given Value of Another Random Variable

2.6.3 Conditional Expectation


2.7 Independence

2.7.1 Independence between Random Variable and Event

2.7.2 Independence between Two Random Variables

2.7.3 Independence among Multiple Random Variables

2.7.4 Variance of Sum of Multiple Independent Variables

Chapter 3 General Random Variables


3.1 Continuous Random Variables

| Random Variable $X$ | Definition | Probability Density Function $f_X(x)$ | Cumulative Distribution Function $F_X(x)$ | Expectation $E[X]=\int_{-\infty}^{\infty} x f_X(x)\,dx$ | Variance $var(X)=E[X^2]-(E[X])^2$ | Moment Generating Function $M_X(s)=E[e^{sX}]$ |
| --- | --- | --- | --- | --- | --- | --- |
| Continuous Uniform $X\sim Unif(a,b)$ | an outcome equally likely to lie anywhere between two bounds | $f_X(x)= \begin{cases} \dfrac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases}$ | $\dfrac{x-a}{b-a},\ a \leq x \leq b$ | $\dfrac{a+b}{2}$ | $\dfrac{(b-a)^2}{12}$ | $\dfrac{1}{b-a} \cdot \dfrac{e^{sb}-e^{sa}}{s}$ |
| Beta $X\sim Beta(\alpha,\beta)$, $\alpha>0$, $\beta>0$ | the conjugate prior for $Bern$, $Bin$, $NB$, $Geo$ in Bayesian inference; a suitable model for the random behavior of percentages and proportions | $\dfrac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1},\ 0 \leq x \leq 1$, where $B(\alpha,\beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ | — | $\dfrac{\alpha}{\alpha+\beta}$ | $\dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ | — |
| Exponential $X\sim Exp(\lambda)$ | the time between events in a Poisson process (in which events occur continuously and independently at a constant average rate) | $\lambda e^{-\lambda x},\ x \geq 0$ | $1-e^{-\lambda x},\ x \geq 0$ | $\dfrac{1}{\lambda}$ | $\dfrac{1}{\lambda^2}$ | $\dfrac{\lambda}{\lambda-s},\ s<\lambda$ |
| Laplace $Z \sim Laplace(0, \lambda^{-1})$ | $Z=X-Y$, where $X,Y \stackrel{i.i.d.}{\sim} Exp(\lambda)$ | $\dfrac{\lambda}{2} e^{-\lambda \lvert z \rvert}$ | $\begin{cases} 1- \dfrac{1}{2}e^{-\lambda z}, & z \geq 0 \\ \dfrac{1}{2}e^{\lambda z}, & z<0 \end{cases}$ | $0$ | $\dfrac{2}{\lambda^2}$ | — |
| Gamma $X \sim Gamma(k, \lambda)$ | the $k$th arrival time in a Poisson process; a sum of $k$ i.i.d. $Exp(\lambda)$ r.v.'s | $\dfrac{1}{\Gamma(k)} (\lambda x)^k e^{-\lambda x} \dfrac{1}{x},\ x>0$, where $\Gamma(k)=\int_0^{\infty} x^k e^{-x} \dfrac{dx}{x}$ | — | $\dfrac{k}{\lambda}$ | $\dfrac{k}{\lambda^2}$ | $\bigg( 1-\dfrac{s}{\lambda} \bigg)^{-k},\ s<\lambda$ |
| Normal $X\sim N(\mu,\sigma^2)$ | the sum of a large number of independent and identically distributed r.v.'s has an approximately normal CDF | $\dfrac{1}{\sigma \sqrt{2\pi}}e^{-(x-\mu)^2/2\sigma^2}$ | $\dfrac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^x e^{-(t-\mu)^2/2\sigma^2}\, dt$ | $\mu$ | $\sigma^2$ | $e^{\sigma^2s^2/2+\mu s}$ |
| Standard Normal $Z\sim N(0,1)$ | $Z=\dfrac{X-\mu}{\sigma}$ | $\dfrac{1}{\sqrt{2\pi}}e^{-z^2/2}$ | $\Phi(z)=\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^z e^{-t^2/2}\, dt$ | $0$ | $1$ | $M_Z(s)=e^{s^2/2}$ |
| Cauchy $T \sim Cauchy(0,1)$ | $T=\dfrac{Z_1}{Z_2}$, where $Z_1,Z_2 \stackrel{i.i.d.}{\sim} N(0,1)$ | $\dfrac{1}{\pi(1+t^2)}$ | $\dfrac{1}{\pi} \arctan(t) + \dfrac{1}{2}$ | undefined | undefined | does not exist |
| Chi-Squared $V \sim \chi^2(k)$ | $V=\sum_{i=1}^k Z_i^2$, where $Z_i \stackrel{i.i.d.}{\sim} N(0,1)$; $\chi^2(k)$ is $Gamma\big(\dfrac{k}{2},\dfrac{1}{2}\big)$ | $\dfrac{1}{2^{k/2}\Gamma(k/2)} v^{k/2} e^{-v/2} \dfrac{1}{v},\ v>0$ | — | $k$ | $2k$ | — |
| Student's $t$ | $T_n=\dfrac{Z}{\sqrt{V/n}}$, where $Z \sim N(0,1)$, $V \sim \chi^2(n)$ | $\dfrac{\Gamma(\frac{n+1}{2})}{\sqrt{n\pi}\,\Gamma(\frac{n}{2})}\Big(1+\dfrac{t^2}{n}\Big)^{-\frac{n+1}{2}}$ | — | $\begin{cases} 0, & n>1 \\ \text{undefined}, & \text{otherwise} \end{cases}$ | $\begin{cases} \dfrac{n}{n-2}, & n>2 \\ \infty, & 1 < n \leq 2 \\ \text{undefined}, & \text{otherwise} \end{cases}$ | undefined |
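Sanity check for one row: the Exponential mean $1/\lambda$ and variance $1/\lambda^2$ can be recovered by a crude Riemann-sum integration of the PDF (the step size, cutoff, and rate below are arbitrary choices of ours):

```python
from math import exp

lam = 1.5
dx = 1e-3
upper = 60 / lam  # cutoff: the tail mass beyond this point is negligible
xs = [i * dx for i in range(1, int(upper / dx))]
f = [lam * exp(-lam * x) for x in xs]  # f_X(x) = λ e^{-λx}

mean = sum(x * fx for x, fx in zip(xs, f)) * dx
ex2 = sum(x * x * fx for x, fx in zip(xs, f)) * dx
var = ex2 - mean**2

assert abs(mean - 1 / lam) < 1e-2      # E[X] = 1/λ
assert abs(var - 1 / lam**2) < 1e-2    # var(X) = 1/λ²
```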

3.2 Cumulative Distribution Functions (CDF)


3.3 Normal Random Variables


3.4 Joint PDFs of Multiple Random Variables

3.4.1 Joint CDFs

3.4.2 Expectation

3.4.3 More than Two Random Variables


3.5 Conditioning

3.5.1 Conditioning a Random Variable on an Event

3.5.2 Conditioning one Random Variable on Another

3.5.3 Conditional Expectation

3.5.4 Independence


3.6 The Continuous Bayes' Rule

Chapter 4 Further Topics on Random Variables


4.1 Derived Distributions

4.1.1 The Linear Case

4.1.2 The Monotonic Case

4.1.3 Multiple Random Variables

4.1.4 Simulation

4.1.5 Sum of Independent Random Variables - Convolution

X,YX,Y are independent r.v.'s, Z=X+YZ=X+Y
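For discrete $X,Y$ the convolution $p_Z(z)=\sum_x p_X(x)\,p_Y(z-x)$ can be computed directly; a sketch (helper names are ours) that also checks the known fact that $Bin(2,p)+Bin(3,p)=Bin(5,p)$ for independent summands with the same $p$:

```python
from math import comb

def binom_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(px, py):
    # p_Z(z) = sum_x p_X(x) p_Y(z - x)
    pz = [0.0] * (len(px) + len(py) - 1)
    for x, a in enumerate(px):
        for y, b in enumerate(py):
            pz[x + y] += a * b
    return pz

p = 0.4
pz = convolve(binom_pmf(2, p), binom_pmf(3, p))
for k, q in enumerate(binom_pmf(5, p)):
    assert abs(pz[k] - q) < 1e-12  # matches Bin(5, p)
```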


4.2 Covariance and Correlation


4.3 Conditional Expectation and Variance Revisited

4.3.1 Conditional Expectation as an Estimator

4.3.2 Conditional Variance


4.4 Transforms (Moment-Generating Function)

4.4.1 From Transforms(MGFs) to Moments

4.4.2 Inversion of Transforms

4.4.3 Sum of Independent Random Variables - MGF (Another way: 4.1.5 Convolution)

4.4.4 Transforms Associated with Joint Distributions


4.5 Sum of a Random Number of Independent Random Variables


4.6 Beta and Gamma

Chapter 5 Limit Theorems


5.1 Markov and Chebyshev Inequalities


5.2 The Weak Law of Large Numbers (WLLN)


5.3 Convergence in Probability


5.4 Central Limit Theorem

5.4.1 Approximations Based on the CLT

5.4.2 De Moivre-Laplace Approximation to the Binomial


5.5 The Strong Law of Large Numbers

Chapter 6 The Bernoulli and Poisson Processes


6.1 The Bernoulli Process

6.1.1 Independence, Memorylessness and Fresh-Start Properties

6.1.2 Interarrival Times

6.1.3 Time of kkth Success/Arrival

6.1.4 Splitting and Merging of Bernoulli Processes


6.2 The Poisson Process

6.2.1 Number of Arrivals in an Interval

6.2.2 Independence and Memorylessness

6.2.3 Interarrival Times

6.2.4 The kkth Arrival Time

6.2.5 Splitting and Merging of Poisson Processes

Number of arrivals of each process in a small interval of length $\delta$ (process 1 with rate $\lambda_1$, process 2 with rate $\lambda_2$):

| process 2 \ process 1 | $0$ (w.p. $1-\lambda_1\delta$) | $1$ (w.p. $\lambda_1\delta$) | $\geq 2$ (w.p. $O(\delta^2)$) |
| --- | --- | --- | --- |
| $0$ (w.p. $1-\lambda_2\delta$) | $1-(\lambda_1+\lambda_2)\delta+O(\delta^2)$ | $\lambda_1\delta+O(\delta^2)$ | $0$ |
| $1$ (w.p. $\lambda_2\delta$) | $\lambda_2\delta+O(\delta^2)$ | $O(\delta^2)$ | $0$ |
| $\geq 2$ (w.p. $O(\delta^2)$) | $0$ | $0$ | $0$ |

Example:
$$P(\text{originates from 1st process} \mid \text{arrival at time } t) = \dfrac{\lambda_1}{\lambda_1+\lambda_2}$$
$$P(k\text{th arrival originates from 1st process}) = \dfrac{\lambda_1}{\lambda_1+\lambda_2}$$
$$P(\text{4 out of first 10 arrivals originate from 1st process}) = {10 \choose 4} \bigg( \dfrac{\lambda_1}{\lambda_1+\lambda_2} \bigg)^4 \bigg( \dfrac{\lambda_2}{\lambda_1+\lambda_2} \bigg)^6$$
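The first identity can be checked by simulation: the next arrival of the merged process comes from process 1 exactly when an $Exp(\lambda_1)$ interarrival time beats an $Exp(\lambda_2)$ one. A sketch (the rates 3 and 2 and the seed are arbitrary choices of ours):

```python
import random

random.seed(0)
lam1, lam2 = 3.0, 2.0
trials = 200_000

# next arrival originates from process 1 iff Exp(λ1) < Exp(λ2)
from_first = sum(
    random.expovariate(lam1) < random.expovariate(lam2) for _ in range(trials)
)
est = from_first / trials
assert abs(est - lam1 / (lam1 + lam2)) < 0.01  # true value λ1/(λ1+λ2) = 0.6
```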

6.2.6 Bernoulli and Poisson Processes, and Sums of Random Variables

$K = \#\text{arrivals in the merged process until one from process } \nu = N_T+1$

$$P(\text{arrival originating from process } \nu)=\dfrac{\nu}{\lambda+\nu}$$

$$K \sim Geo \bigg( \dfrac{\nu}{\lambda+\nu} \bigg),\quad p_K(k)=\bigg( \dfrac{\lambda}{\lambda+\nu} \bigg)^{k-1} \bigg( \dfrac{\nu}{\lambda+\nu} \bigg),\quad p_{N_T}(n)=p_K(n+1)=\bigg( \dfrac{\lambda}{\lambda+\nu} \bigg)^{n} \bigg( \dfrac{\nu}{\lambda+\nu} \bigg)$$

6.2.7 The Random Incidence Paradox

$L=(t^*-U)+(V-t^*)$, where $(t^*-U) \sim Exp(\lambda)$ and $(V-t^*) \sim Exp(\lambda)$, so $L \sim Erlang(2, \lambda)$ and $E[L]=2/\lambda$.
An observer who arrives at an arbitrary time is more likely to fall in a large rather than a small interarrival interval. As a consequence, the expected length seen by the observer is higher: $2/\lambda$, compared with the $1/\lambda$ mean of the exponential PDF.
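A simulation sketch of the paradox (the horizon, seed, and number of inspection times are arbitrary choices of ours): sample a long Poisson path, inspect it at arbitrary times, and average the length of the interarrival interval containing each inspection time:

```python
import random
from bisect import bisect_right

random.seed(1)
lam = 1.0

# one long Poisson sample path: arrival times from i.i.d. Exp(λ) interarrivals
arrivals, t = [], 0.0
while t < 20_000.0:
    t += random.expovariate(lam)
    arrivals.append(t)

# inspect at arbitrary times; record the interarrival interval containing t*
lengths = []
for _ in range(2_000):
    t_star = random.uniform(100.0, 19_000.0)
    i = bisect_right(arrivals, t_star)  # first arrival after t*
    lengths.append(arrivals[i] - arrivals[i - 1])

mean_seen = sum(lengths) / len(lengths)
assert abs(mean_seen - 2 / lam) < 0.3  # observed mean ≈ 2/λ, not 1/λ
```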

6.2.8 Poisson vs. Normal Approximations of the Binomial

Chapter 7 Markov Chains


7.1 Discrete-Time Markov Chains

7.1.1 The Probability of a Path

7.1.2 nn-Step Transition Probabilities


7.2 Classification of States


7.3 Steady-State Behavior

7.3.1 Long-Term Frequency Interpretations

7.3.2 Birth-Death Processes


7.4 Absorption Probabilities and Expected Time to Absorption

7.4.1 Expected Time to Absorption

7.4.2 Mean First Passage and Recurrence Times


7.5 Continuous-Time Markov Chains

7.5.1 Approximation by a Discrete-Time Markov Chain

7.5.2 Steady-State Behavior

7.5.3 Birth-Death Processes

Chapter 8 Bayesian Statistical Inference

Unknown parameters are viewed as random variables within a single, fully specified probabilistic model


8.1 Bayesian Inference and the Posterior Distribution


8.2 Point Estimation, Hypothesis Testing, and the MAP Rule

8.2.1 Point Estimation

8.2.2 Hypothesis Testing


8.3 Bayesian Least Mean Squares (LMS) Estimation

8.3.1 Some Properties of the Estimation Error

$\hat{\Theta} = E[\Theta \vert X],\quad \tilde{\Theta} = \hat{\Theta} - \Theta$

8.3.2 The Case of Multiple Observations and Multiple Parameters


8.4 Bayesian Linear Least Mean Squares (LLMS) Estimation

8.4.1 LLMS Estimation Based on a Single Observation

8.4.2 The Case of Multiple Observations and Multiple Parameters

8.4.3 Linear Estimation and Normal Models

8.4.4 The Choice of Variables in Linear Estimation


8.5 Summary

| Estimator | $\hat{\Theta} = g(X)$ | Characteristics |
| --- | --- | --- |
| MAP | $\hat{\theta} = \arg\max_{\theta} p_{\Theta \vert X}(\theta \vert x)$ ($\Theta$ discrete); $\hat{\theta} = \arg\max_{\theta} f_{\Theta \vert X}(\theta \vert x)$ ($\Theta$ continuous) | minimizes the probability of error, both unconditionally and conditionally given any observation value $x$ |
| LMS | $\hat{\Theta} = E[\Theta \vert X]$ | minimizes $MSE = E[(\Theta-E[\Theta \vert X])^2] = var(\Theta \vert X)$ |
| LLMS | $\hat{\Theta} = E[\Theta] + \dfrac{cov(\Theta,X)}{var(X)}(X-E[X]) = E[\Theta] + \rho \dfrac{\sigma_{\Theta}}{\sigma_X}(X-E[X])$ | simple to compute; minimizes the MSE over linear functions of the observation, where $MSE = (1-\rho^2)\sigma_{\Theta}^2$ |
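On a toy discrete joint PMF (invented here purely for illustration), the rows can be compared directly: the LMS estimator attains the smallest MSE of all, and the best linear estimator's MSE equals $(1-\rho^2)\sigma_{\Theta}^2$:

```python
# toy joint PMF of (Θ, X) on a 3x3 grid; probabilities sum to 1
joint = {
    (0, 0): 0.20, (0, 1): 0.10, (0, 2): 0.05,
    (1, 0): 0.05, (1, 1): 0.20, (1, 2): 0.10,
    (2, 0): 0.05, (2, 1): 0.05, (2, 2): 0.20,
}
E = lambda g: sum(p * g(th, x) for (th, x), p in joint.items())

m_th, m_x = E(lambda th, x: th), E(lambda th, x: x)
var_th = E(lambda th, x: th * th) - m_th**2
var_x = E(lambda th, x: x * x) - m_x**2
cov = E(lambda th, x: th * x) - m_th * m_x

def lms(x):  # E[Θ | X = x]
    px = sum(p for (th, x2), p in joint.items() if x2 == x)
    return sum(th * p for (th, x2), p in joint.items() if x2 == x) / px

# LLMS: E[Θ] + cov(Θ,X)/var(X) * (x - E[X])
llms = lambda x: m_th + cov / var_x * (x - m_x)

mse_lms = E(lambda th, x: (th - lms(x)) ** 2)
mse_llms = E(lambda th, x: (th - llms(x)) ** 2)

assert mse_lms <= mse_llms + 1e-12  # LMS is optimal among all estimators
rho2 = cov**2 / (var_th * var_x)
assert abs(mse_llms - (1 - rho2) * var_th) < 1e-12  # (1-ρ²)σ_Θ²
```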

Chapter 9 Classical Statistical Inference

Unknown parameters are viewed as constants to be determined; each possible value of the unknown parameter has a separate probabilistic model


9.1 Classical Parameter Estimation

9.1.1 Properties of Estimators

9.1.2 Maximum Likelihood Estimation (ML)

9.1.3 Estimation of the Mean and Variance of a Random Variable

9.1.4 Confidence Intervals

9.1.5 Confidence Intervals Based on Estimator Variance Approximations


9.2 Linear Regression

9.2.1 Justification of the Least Squares Formulation

9.2.2 Bayesian Linear Regression

9.2.3 Multiple Linear Regression

9.2.4 Nonlinear Regression

9.2.5 Practical Considerations


9.3 Binary Hypothesis Testing


9.4 Significance Testing

9.4.1 The General Approach

9.4.2 Generalized Likelihood Ratio and Goodness of Fit Tests